
test: add reproducer for case-insensitive write rejection (same field ID, different column casing)#562

Open
pandaamit91 wants to merge 3 commits into linkedin:main from pandaamit91:ampanda/oh-case-insensitive-writes-repro

Conversation

Contributor

@pandaamit91 pandaamit91 commented Apr 27, 2026

Summary

We have noticed that OH writes with different column casing already succeed in some cases, and we want to validate the existing behavior before applying any fix. This PR adds characterization tests that document exactly which write paths work today and which do not, with an explanation of why.

Key Findings

  • df.writeTo().append() already works with the default caseSensitive=false setting. Spark's Iceberg integration maps DataFrame column names to table column names case-insensitively at analysis time. The data files are written using the stored
    column names, so the commit carries the unchanged existing schema — writeSchema.sameSchema(tableSchema) is true and the server's validateWriteSchema is never invoked. DaliSpark (a wrapper over df.writeTo()) gets this for free.
  • Explicit column-list SQL INSERT always fails in Spark 3.1 when casing differs, even with caseSensitive=false. Spark 3.1's ResolveInsertInto rule matches the INSERT column list case-sensitively regardless of the config setting, which only
    governs column references in SELECT expressions, not INSERT column lists. The unresolved column is silently dropped, causing AnalysisException: not enough data columns.
  • The server-side normalization fix is scoped to non-Spark clients — Trino DML, direct Iceberg Java API, and plain REST — that send a PATCH body with column names in a different casing than what's stored. Those clients don't go through Spark's
    case-insensitive resolution layer, so the server is the only place to normalize.
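The server-side normalization described in the last bullet amounts to rewriting incoming column names to the casing stored in the table schema. A minimal sketch of that idea, assuming stored schemas never contain two names differing only by case — `normalize_columns` is a hypothetical illustration, not the actual OpenHouse API:

```python
def normalize_columns(patch_columns, stored_columns):
    """Rewrite incoming column names to the stored schema's casing,
    matching case-insensitively. Hypothetical sketch of the server-side
    normalization; not the actual OpenHouse implementation."""
    # Assumes the stored schema has no names differing only by case.
    by_lower = {c.lower(): c for c in stored_columns}
    normalized = []
    for name in patch_columns:
        stored = by_lower.get(name.lower())
        if stored is None:
            raise ValueError(f"unknown column: {name}")
        normalized.append(stored)
    return normalized

# A Trino or plain-REST client sending lowercase names against a table
# that stores upper-case columns would be normalized before validation:
print(normalize_columns(["id", "name"], ["ID", "NAME"]))  # ['ID', 'NAME']
```

Clients that go through Spark's own case-insensitive resolution never need this path; it only matters for PATCH bodies arriving with mismatched casing.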

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

Testing Done

  • Manually tested on local docker setup. Please include commands run, and their output.

  • Added new tests for the changes made.

  • Updated existing tests to reflect the changes made.

  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.

  • Some other form of testing like staging or soak time in production. Please explain.

    Test: Adds CaseInsensitiveWriteTest — a mock-based Spark e2e characterization test that establishes a baseline of which write paths already handle case-mismatched column names before any server-side fix is applied.

Screenshot 2026-04-27 at 2 33 44 PM

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.

… from stored schema

Responds to the reviewer's observation that "writes with different casing already
succeed." These tests establish the baseline behavior before any fix is applied.

Three scenarios are documented:

1. testPositionalInsert_succeedsRegardlessOfStoredCasing
   Positional INSERT (no column list) never needs to resolve column names, so
   casing differences are irrelevant. Works unconditionally.

2. testExplicitColumnInsert_succeedsWithDefaultCaseSensitivity
   INSERT with an explicit lowercase column list (e.g. "id") against a table
   that stores "ID" succeeds with the Spark default (caseSensitive=false).
   Spark resolves "id" → "ID" at analysis time, so the server receives the
   correct casing. This confirms the reviewer's observation.

3. testExplicitColumnInsert_failsWhenCaseSensitiveEnabled
   The same explicit-column INSERT fails with an AnalysisException when
   spark.sql.caseSensitive=true. "id" cannot be resolved against "ID" on the
   client before the request ever reaches the server.

Together these tests show: writes already work under the default Spark
configuration, but fail once caseSensitive=true is in effect — a gap that
exists independently of any server-side schema normalization.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
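The three scenarios above all hinge on how column resolution behaves under spark.sql.caseSensitive. A toy model of that analysis-time resolution (hypothetical Python, not Spark internals) captures the difference:

```python
def resolve_column(name, table_columns, case_sensitive):
    """Toy model of Spark's analysis-time column resolution.
    With case_sensitive=False, "id" resolves against a stored "ID";
    with case_sensitive=True it does not. Hypothetical illustration,
    not actual Spark code."""
    for stored in table_columns:
        if stored == name:
            return stored
        if not case_sensitive and stored.lower() == name.lower():
            return stored
    return None  # unresolved: Spark raises AnalysisException here

# Scenario 2: default caseSensitive=false resolves "id" -> "ID"
print(resolve_column("id", ["ID"], case_sensitive=False))  # ID
# Scenario 3: caseSensitive=true fails before the request reaches the server
print(resolve_column("id", ["ID"], case_sensitive=True))   # None
```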
@pandaamit91 pandaamit91 force-pushed the ampanda/oh-case-insensitive-writes-repro branch from d634c9c to 13395e9 Compare April 27, 2026 21:04
pandaamit91 and others added 2 commits April 27, 2026 14:10
… append

Add testDataFrameWriteTo_failsWhenCaseSensitiveEnabled to complete the
characterization of existing write behavior. With caseSensitive=true,
Spark cannot resolve lowercase "test" or ALL-CAPS "TEST" against stored
"TeSt", so both writeTo().append() variants throw AnalysisException
before reaching the server — documenting the gap that exists regardless
of any server-side normalization fix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… Spark 3.1

Spark 3.1's ResolveInsertInto rule matches INSERT column list names
case-sensitively regardless of spark.sql.caseSensitive. Rename
testExplicitColumnInsert_succeedsWithDefaultCaseSensitivity to
testExplicitColumnInsert_failsEvenWithDefaultCaseSensitivity and flip
its assertion to assertThrows, matching the actual observed behavior.
Update class-level Javadoc to reflect the corrected findings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
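The ResolveInsertInto behavior this commit describes (case-sensitive matching of the INSERT column list regardless of spark.sql.caseSensitive, with an unmatched name silently dropped) can be modeled in a short sketch; `match_insert_columns` is a hypothetical stand-in, not Spark's actual rule:

```python
def match_insert_columns(insert_columns, table_columns):
    """Toy model of Spark 3.1's ResolveInsertInto column-list matching:
    names are compared case-sensitively no matter what
    spark.sql.caseSensitive says, and an unmatched name is silently
    dropped. Hypothetical sketch, not Spark source."""
    matched = [c for c in insert_columns if c in table_columns]
    if len(matched) < len(table_columns):
        # Spark surfaces this as: AnalysisException: not enough data columns
        raise ValueError("not enough data columns")
    return matched

# INSERT INTO t (id) ... against a table storing "ID":
try:
    match_insert_columns(["id"], ["ID"])
except ValueError as e:
    print(e)  # not enough data columns
```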
@cbb330
Collaborator

cbb330 commented Apr 28, 2026

Thanks @pandaamit91, the current state of the world is clear now.

What is the action item for the Spark client, given that the mixed-case write will throw an error before it lands on the OH server?
